ABSTRACT

Effects of assessment on curriculum and instruction 
were studied in an investigation that considered local educators'
reactions to statewide minimum competency testing, the instructional 
effects of implementing these tests, and variations within each 
state. The two states examined were Pennsylvania, a low-stakes 
situation with relatively minor consequences for the student, and 
Maryland, a high-stakes situation where high school graduation 
depends on passing the tests. Pennsylvania students were tested in 
grades 3, 5, and 8, while Maryland students were tested beginning in 
grade 9. Fieldwork was conducted in six sites in each state, and over
250 local educators and students were interviewed. A survey completed 
by 277 of 501 Pennsylvania districts and 23 of 24 Maryland districts 
provided additional information. The impact of the testing program 
was far greater in the high-stakes situation, and this impact seemed 
positive as long as the districts were not under too much pressure. 
Both positive and negative consequences are discussed. Five tables 
present study findings. (Contains 14 references.) (SLD) 
TWO STATE MINIMUM COMPETENCY TESTING PROGRAMS AND
THEIR EFFECTS ON CURRICULUM AND INSTRUCTION

Although close to sixty percent of the states in this country have
mandated some form of standardized testing (Marshall, 1987), debate continues
about the local-level impact of implementing such testing programs. The
effects of assessment initiatives are not clear and have not been well
informed by empirical research (Airasian, 1987; Rosenholtz, 1987; Stake,
Bettridge, Metzer, & Switser, 1987). Little is known about how curriculum
and instruction are affected by statewide standardized testing; even less is
known about how differences in state programs and school district
characteristics magnify or minimize the effects. This chapter is an effort to
address those issues.

The study upon which the chapter is based belongs in the genre of 
research projects that examine assessment effects on local educational 
agencies (LEA) — that is, the study of the intended and unintended consequences 
for curriculum and instruction of implementing assessment programs. The study 
had three purposes: (1) to gather local educators' reactions to the 
initiation of statewide, mandatory minimum competency testing in their 
respective states; (2) to compare the instructional effects of implementing 
these testing programs on local school systems in two states; and (3) to 
explain district-to-district variations in effects within each state. 

In this chapter, findings related to the first purpose are presented in 
the form of a "Gallup Poll." Educators' responses to selected, individual
items from a questionnaire administered in the two states are reported. 
To address the second purpose, individual questionnaire items were combined 
into scales measuring various local system adjustments to facilitate 
between-state comparisons. Implementation effects were not uniform across the
school systems within each of the two states, and the third purpose of the
study was to explain these differences. The remainder of this chapter 
describes the testing programs in two states, presents the conceptual 
framework that guided data collection and analysis, details the research 
methods used, and summarizes the results. 

The Testing Programs in Two States

The two states represented "low stakes" (Pennsylvania) and "high stakes" 
(Maryland) situations. The level of the stakes associated with a test is the 
extent to which test performance is perceived by students, teachers, 
administrators, and/or parents to be "used to make important decisions that 
immediately and directly affect them" (Madaus, 1987:7). Relatively minor
consequences attended student performance on Pennsylvania's minimum competency
tests (MCT) in language and math. The purpose of both tests originally was to
identify students needing additional classroom instruction who may not have 
been identified by other means. Maryland's "high stakes" strategy required 
students to pass reading, writing, math, and citizenship minimum competency 
tests in order to receive a high school diploma. The tests were being phased 
in as graduation requirements; at the time of the survey only the reading and 
math tests "counted." 

The two states' MCT programs had several important differences (see Table
1). The first difference concerned the purposes detailed above. Second, 
Pennsylvania students took their tests in the third, fifth and eighth grades. 
Maryland tested students beginning in ninth grade, although a practice test 
was administered in the eighth grade. Third, the Pennsylvania state 
legislature made a special appropriation to fund remediation efforts, whereas 
Maryland offered no financial assistance for this purpose. Fourth,
Pennsylvania's program was a legislative response to the calls for educational
reform in the early 1980s and, after soliciting educators' input on
appropriate test objectives, commercial test publishers were invited to bid on
a contract to develop the state's instrument. Maryland initiated a statewide 
curriculum improvement program several years prior to beginning the testing 
program with the expressed purpose of anticipating the instructional quality 
necessary to perform well on the tests. Educators from around the state were 
used by the SEA to provide input into the content and form of the tests. 

Table 1

Summary of Two Mandatory, Minimum Competency,
State Testing Programs

                                       STATE
Areas of Difference  PA                         MD
-------------------  -------------------------  ---------------------------
TEST CONTENT         Reading, Math              Reading, Math, Writing,
                                                Citizenship

GRADES TESTED        3, 5, 8                    8 (Practice)
                                                9, 10-12 Retests

PARTICIPATION        Mandatory                  Mandatory

STATE FOCUS          Use of test results to     Identification of failing
                     identify students in       students to aid districts
                     need of additional         in curriculum planning
                     instruction

LOCAL CONSEQUENCES   Additional funds for       Students must pass test to
                     low scoring students       graduate; LEAs required to
                                                provide appropriate
                                                assistance to failing
                                                students

The programs' stakes changed during the study. In Pennsylvania, the
Chief State School Officer (CSSO) released district rankings based on the test
prior to the 1987-88 school year and touted the test as an appropriate
indicator of school effectiveness. Study interviews conducted subsequent to
this event revealed considerable concern on the part of local educators that
the tests were being used in ways for which they were not originally intended,
even though the rankings were quickly withdrawn due to the furor surrounding
them. Regardless, the importance of the tests increased for both educators
and the public. Maryland had no similar dramatic event; instead its districts
had to reconcile themselves to the inevitable day when all four tests would
affect whether students graduated, with the two new tests generating
considerable controversy and calls for revision. The difficulty students were
having passing the two tests augmented the pressure on educators even under an
already high stakes condition.

Conceptual Framework 
The effects of introducing and operating a mandatory statewide testing 
program were expected to require adjustments in the local instructional 
program, organization, and culture. An underlying assumption of this study 
was that the mandatory testing programs had far-reaching ramifications for the 
technology, structure, and values in place in school systems depending upon 
what was at stake. (See Figure 1.) This chapter looks at instructional
adjustments, specifically at the strategies devised by a district to improve
test scores and at modifications to curriculum and teaching intended to
improve the match between course and test content. The other two adjustment
categories in Figure 1 are examined only to the extent that they help 
understand variations in the instructional adjustments. 




Whether or not adjustments actually occurred was partially dependent on at
least two aspects of a system's operating environment. Summarized in Figure
1, these aspects were: (1) selected features of school district context, and 
(2) characteristics of the state testing program. 



Figure 1

SYSTEM ENVIRONMENT
1. District Context
• internal contextual and demographic characteristics
• district-SEA relationship
2. State Testing Program
• high or low stakes

SYSTEM ADJUSTMENTS
1. Instructional
• strategies
• curriculum & instruction
2. Organizational
• information flow
• benchmark [remainder illegible]
3. Cultural
• quality of work life
• quality of student life

SYSTEM EFFECTIVENESS
1. Student Focus
• test scores
• dropout rate
• attendance
• post-school [remainder illegible]
2. Teacher Focus
• job satisfaction
• commitment


With respect to school district context, years of research on educational
change point to an inescapable conclusion: some programs work some times in
some places, and it is mostly the time and the place that explain the fate of
a program (Berman, 1981; Corbett, Dawson, & Firestone, 1984). Both Elmore
(1980) and Berman (1981) argue that policy implementation can only be
understood in terms of the context of the "target's" setting; policy makers'
intentions become diffused and redirected as they pass through the prism of
local politics, organization and culture. Thus, changes in the test scores
over time were assumed to be the product of the complex interaction among
system demographic and internal contextual characteristics, its relationship
with the external environment, particularly the state education agency (SEA),
and the kinds of adjustments the system made to implement the tests.
Features of the state testing program also would influence the type and
magnitude of local system adjustments that were made. The essential
difference in this study was that the program in Maryland made graduation from
high school dependent upon a student's passing writing, citizenship, reading,
and math tests. In Pennsylvania, the test was intended formally to serve as a
tool for fine-tuning classroom instruction to meet certain students' needs.
Thus, the study compared Maryland's high-stakes program having consequences
for graduation to Pennsylvania's low-stakes MCT testing program. According to
Madaus (1987), "high-stakes" programs are used for important decisions and
thus have the power to modify local behavior; "low-stakes" programs are
generally not anticipated to be central to decision-making, and test
performance usually does not stimulate significant rewards or sanctions. The
two states were selected for study to accentuate this high-stakes/low-stakes
distinction.

There are several reasons why higher stakes situations can be expected to
have greater local impacts. First, mandatory tests are likely to force 
adjustments in a system by creating expectations for what the outcomes of 
schooling should be. According to Mintzberg (1983), stipulating outcomes is 
one means used widely in organizations to affect operations. Some standard — 
no matter how narrowly defined — is to be met, regardless of what else staff 
members may want to accomplish. 

Second, one of education's fundamental tasks is to move students smoothly
through a series of grades to graduation (Schlechty, 1976). Staff
responsibilities, the number of classrooms needed, and the availability of
sufficient materials are all predicated in most communities on the assumption
that most first graders will become second graders and that most seniors will
graduate on time. A few exceptions cause no problems, but testing programs
change the assumptions by inserting an unpredictable checkpoint for
determining progress for all students that is based on something other than 
student age, credits obtained, or time spent in school. 

Third, establishing a standard all students must meet as a visible 
indicator of effectiveness runs counter to the ethos of many educators 
(Rosenholtz, 1987). In spite of enormous standardization, a tone of 
individualism permeates American education (Lortie, 1975). Teachers are 
allowed considerable autonomy in determining what and how to teach, and they
expect to handle their classrooms on their own. Testing programs, therefore,
challenge an ingrained ethos concerning curriculum and instruction decisions.
Test items highlight critical content to cover, test administration dates
determine the deadline for teaching the content, item formats affect how the
information will be accessed, and the standards add a quality of sameness to 
what students should achieve. 

The conceptual framework (Figure 1) also points to an additional
important question: Have the instructional adjustments made the district more
effective? Narrowly conceived, this question merely suggests an examination
of a district's success in helping students meet the standards set by the
test. However, it is becoming more and more clear that definitions of
effectiveness and the extent to which they are shared are context dependent
(Rossman, Corbett, & Firestone, 1988). Effectiveness, thus, may be defined
more by how well a system prevents dropouts, improves attendance, stimulates
student enthusiasm for learning, or addresses student differences than by
doing better on a test. A study of this magnitude is not an appropriate
vehicle for answering this question. While the study does tap perceptions of
a district's reach for improvement, its major focus is on explaining system
adjustments, not the ultimate effectiveness of the testing programs.



Study Design 


The conceptual framework simplifies a very complex situation. 
Introducing and operating a mandatory statewide testing program involves a 
wide range of potential challenges to a district. Many of these can be 
deduced from a conceptual framework such as the one above. However, 
using an inductive approach in which the researcher can take advantage of
unexpected developments can be equally valuable (Miles & Huberman, 1984). For
this reason, the study was designed to include both open-ended qualitative
fieldwork and structured questionnaires.

The study was conducted in three phases. First, a preliminary round of 
qualitative fieldwork was performed wherein researchers visited each of 12
school districts for several days to interview a wide variety of staff
members. Second, the results from the interviews were used to design a
questionnaire to be administered throughout districts in the states studied.
Third, the survey results were used to structure a final round of feedback and
interviews in the sites originally visited.

Phase One: Fieldwork in 12 Sites

Six sites in each of the two states were visited. Site selection was 
made on the basis of district size and type of community served, primarily
because these characteristics were assumed to determine the kind of staff 
resource demands implementing the test would make. Equally important was the 
willingness of the district to participate because the purpose of this phase 
was to explore issues in depth, not to generalize to a larger population. 
Selection was carried out with the input and assistance of key SEA staff
members in each state. 
Six experienced field researchers conducted the site visits. One
researcher spent two or three days in each site depending on district size.
The first day was spent in the central office, interviewing the superintendent
(if available), the person(s) responsible for handling the testing program,
and other district staff members who dealt with the test. Also, pertinent
documents were examined where available. School interviews were conducted
with administrators, guidance counselors, teachers, and students. When all
appropriate schools in a district could not be visited, selection was made in
collaboration with district personnel. Sampling a variety of schools in the
district was the foremost criterion. Over 250 local educators and students
participated in the interviews.

Interview Questions. Field researchers operated from interview guides
with broad categories of questions.¹ Specific phrasing of questions and the
particular probes used were determined by the researcher on site. In training
sessions conducted prior to the site visit, researchers had an opportunity to
generate and discuss potential questions and follow-up probes, but fieldwork
of this type demands that the researcher have considerable flexibility in
determining who to talk to, what to ask, and when to ask it. The goal was to
obtain data on each category from multiple sources but not necessarily from
every source.

Data Management. A multiple-case, multiple-researcher, open-ended
interview study places a heavy burden on the data management system. A
systematic way of determining data gaps, locating overlooked sources, making
data accessible to other researchers, and being able to retrieve parts of the
data was imperative. To accomplish this, resources were allocated more to
developing data summaries than to making handwritten field notes presentable
or typing transcripts from tape recordings. When researchers returned from a
site visit, they completed a series of data summary charts: (1) a summary of
information sources and the question categories for which each source supplied
information; (2) a description of source-identified effects coupled with the
researcher's designation of which and how many staff members listed each
effect; (3) a summary of data on the district's instructional, organizational,
and cultural contexts as well as its relationship with the surrounding
community and the SEA; and (4) a listing of residual incidents and data worthy
of note that did not fit cleanly in the structured charts.

¹ For further documentation of interview protocols and data summaries the
reader is referred to Corbett and Wilson (1987).

These data summary charts were used by the authors to conduct the 
cross-site analysis and they were the stimulus for determining whether 
additional information needed to be gathered from particular sites. 

Data Analysis. The analysis activities consisted of reviewing the 
data summary charts to identify implementation themes that cut across the 
12 sites. The specific goal of the analysis was to develop items for the 
questionnaire to be used in the second phase of the study. 

Seven themes emerged from the researchers' extensive review of the data
summary charts. These were: 

• Few staff quarreled vehemently with the appropriateness of a
statewide test. "We need something like this" was a frequent
refrain.

• At the same time, the tests' information was viewed as generally
redundant in most districts, especially the suburban ones.

• "Teaching to the test" was a major concern and acknowledged as
the most expedient means of trying to improve test scores.
Perceptions about the "propriety" of the practice varied.
Probably heard most often was: "I don't believe in it, but we have to
do it to get scores up."

• Staff members from districts or schools that did well on the
tests were less unhappy about the program. Essentially they
were pleased that the test scores gave the public confirmation
of the good job they knew they were already doing.

• Socio-economic status of the community and community attitudes
toward education were generally viewed as being major
determinants of test results.

• Wide dissatisfaction with administration of the program and the
validity of the tests was expressed.

• Numerous issues emerged that were clearly state-specific. For
example, Pennsylvania districts liked the "no strings" money
from the state; Maryland districts devoted considerable
attention to documentation to protect themselves against
"probable" lawsuits.

The authors returned to the original field notes to review the 
terminology local educators used in discussing the tests. Using the 
conceptual framework, list of themes, data summary chart information, and this
review of responses, individual questionnaire items were constructed. The 
items fell into five categories: local internal and external operating 
contexts, the administration of the tests in the local setting, the strategies 
used to maximize student performance, the purposes the tests were used for in 
the local setting, and the impact of the tests on instruction, organization, 
and culture. A questionnaire with 83 items was produced from this synthesis.

Phase Two: Survey Design

The second phase of the study involved a quantitative assessment of 
the local ramifications of mandatory statewide testing programs. Four major 
activities—instrumentation, sampling, data collection and analysis—were 
conducted during this phase. 

A first draft of the questionnaire was designed that could be 
self-administered in 20 to 30 minutes. A pilot test of the draft instrument 
was conducted in several districts to ensure that the questionnaire was clear, 
communicated the intent of the project, and could be completed within time
constraints. Changes to the questionnaire were made on the basis of the
criticism that was offered.

All districts in both states were invited to participate in the study 
(Pennsylvania = 501; Maryland = 24). Three different role groups familiar
with the testing program were targeted for each district: central office 
administrators, principals, and teachers. A separate questionnaire was 
completed by each role group member. In Maryland, where there were fewer but 
larger school districts, three respondents from each role group within the 
district were asked to complete the survey. Only one person from each role 
group within the district completed the survey in Pennsylvania. The 
participating staff members in each system were selected by the superintendent 
or a designee.

In Pennsylvania, 277 of the 501 districts responded with one respondent 
for each of three role groups (central office, principal, and teacher). In 
Maryland, 23 of the 24 districts returned useable questionnaires with three 
respondents for each of three role groups. An analysis of the participating 
and non-participating districts in Pennsylvania showed no significant 
differences between the two groups in terms of basic demographic 
characteristics (e.g., size, wealth, location).
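
The district counts above imply response rates and an upper bound on the number of individual respondents. The following is a minimal sketch of that arithmetic, not a computation reported by the authors. The helper names are illustrative, and the ceiling assumes that every responding district returned a complete set of questionnaires, which the text implies but does not confirm district by district.

```python
# Figures taken from the text; helper names are illustrative assumptions.
ROLE_GROUPS = 3  # central office administrators, principals, teachers

SURVEY = {
    # state: (districts invited, districts responding, respondents per role group)
    "Pennsylvania": (501, 277, 1),
    "Maryland": (24, 23, 3),
}


def response_rate(invited, responded):
    """Percentage of invited districts that returned questionnaires."""
    return 100.0 * responded / invited


def respondent_ceiling(responded, per_role_group):
    """Upper bound on respondents, assuming each responding district
    returned a full set of questionnaires (an assumption, see lead-in)."""
    return responded * ROLE_GROUPS * per_role_group


total_ceiling = 0
for state, (invited, responded, per_role) in SURVEY.items():
    ceiling = respondent_ceiling(responded, per_role)
    total_ceiling += ceiling
    print(f"{state}: {response_rate(invited, responded):.1f}% of districts, "
          f"at most {ceiling} respondents")
print(f"Combined ceiling: {total_ceiling} respondents")
```

Under these assumptions the district response rates are about 55 percent in Pennsylvania and 96 percent in Maryland, and the combined ceiling of 1,038 respondents sits slightly above the N of 1019 reported with Table 3.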

The analysis had three foci. The first was to identify educators' 
responses concerning the adjustments they had made. Frequency distributions 
for questionnaire items were used to display these responses. The second 
focus was to examine cross-state differences for instructional adjustments.
Analyses of variance were conducted to compare responses in the two states.
The third was to examine within-state district variations for adjustments made
to curricula and instruction. Multiple regression techniques were used to 
assess the contribution of multiple variables to these adjustments. 




Phase Three: Follow-up Fieldwork

In the fall of 1987, field researchers returned to 11 of the original 12 
sites visited in Phase One, with one Maryland district declining to 
participate. The purposes of these visits were to trace subsequent 
developments in the operation of the state testing program and to obtain 
assistance in interpreting the results of the survey. Over 80 local educators
participated in this activity. The interviews concentrated on the findings 
contained in the section on within-state district variations. The findings 
were presented to participants and they then reacted to specific numbers,
interpretations, and implications. These reactions then were incorporated 
into the quantitative results section of this chapter. 

Findings Regarding Educators' Reactions to Statewide Tests
This section gives a flavor of how educators felt about their respective
states' programs and hints at important differences between the two states as
well as important variations within each state. The specific focus for this
chapter is on items related to curriculum and instruction, and particular
attention is paid to district strategies used to improve the MCT scores and to
alterations in course content and instructional activities made to match test
objectives. In addition, two items that address whether the curriculum had
narrowed or improved are also discussed.

The cluster of items concerning the local strategies provided an estimate
of the intensity of a system's instructional effort to improve the test
scores. Items in this cluster assessed how true each of these statements
was:

• Students take a practice test at some point before they take the 
actual [state] test. 
• Content and skills covered in the [state] test are reviewed just prior
to test administration.

• The district has provided assistance (e.g., in staff meetings,
in-service sessions, and other activities) to help staff identify ways
to improve [state test] scores.

• Staff development resources have been allocated to [state test]
related activities.

• Special effort has been put into working with the schools in the
district where [state test] scores have been lower.

• The entire district is making an all-out intentional effort to improve
its [state test] scores.

Curriculum and instruction alterations included items related to the
extent of adjustments made in course content and teaching practices. Four
items concerned how often the test was used for the following purposes:

• To identify instructional objectives/content already being addressed
in the curriculum that were in need of greater emphasis.

• To identify previously unaddressed instructional objectives/content
that need to be added to the curriculum.

• To determine student placement in instructional groups within a class.

• To determine student placement in homogeneously grouped classes or
courses.

Four items concerned the magnitude of change: 

• Teachers have altered the content of their classes. 

• Teachers have adopted new instructional approaches.

• Staff members have been introduced to important new instructional 
ideas. 

• Basic skills instruction has spread throughout the curriculum.




Two single items explored staff perceptions of the magnitude and
direction of the changes:

• The curriculum has been narrowed. 

• The curriculum has improved. 

Frequency distributions for the respondents in each state are presented 
in Table 2. These comparisons combine the responses from all three role 
groups that completed the survey — teachers, building principals, and central 
office administrators. The numbers in the table represent the percent of 
educators responding to each category. 
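
As an illustration of how percentages of this kind can be tabulated, the sketch below pools responses to one item and reports the share falling in each response category. It is only a sketch: the response values are hypothetical rather than survey data, and the function name and the 1-to-5 coding are assumptions made for the example.

```python
from collections import Counter


def percent_distribution(responses, categories=(1, 2, 3, 4, 5)):
    """Percent of respondents choosing each response category for one item."""
    if not responses:
        raise ValueError("no responses to tabulate")
    counts = Counter(responses)
    total = len(responses)
    return {c: round(100.0 * counts.get(c, 0) / total, 1) for c in categories}


# Hypothetical pooled responses (teachers, principals, central office staff)
# to a single questionnaire item; these are not values from the study.
pooled = [1, 1, 2, 3, 3, 3, 4, 5]
for category, pct in percent_distribution(pooled).items():
    print(f"category {category}: {pct}%")
```

Because the three role groups are pooled before tabulating, a distribution computed this way does not show whether teachers, principals, and central office staff answered differently.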

The findings with respect to the six items focusing on the intensity of
district strategies to improve test scores were consistent in showing 
significant variation between educators' views in Pennsylvania (the low stakes 
state) and Maryland (the high stakes state). Educators in Pennsylvania 
reported almost no use of practice tests or content review just prior to test 
administration whereas the opposite was true in Maryland. The other 
district-wide strategies (using staff development resources, working with 
low-achieving schools, etc.) were much more likely to occur in Maryland than 
Pennsylvania. 

It should also be noted that there was considerable variation in
educators' responses within each state. For example, in Pennsylvania the
question regarding district assistance to help staff identify ways to increase
test scores produced a mix of responses with anywhere from one-eighth to just
over one-quarter of the respondents selecting each of the five response
categories.

The pattern of responses is similar for the next eight items that address 
adjustments made in course content and instructional practices. The educators 
in the high stakes state (Maryland) reported more alterations than the
Pennsylvania respondents on the items dealing with changes to class content, 
new instructional approaches, exposure to new ideas, and the spreading of 
basic skills instruction. Differences between the two states were not 
pronounced on the items concerning objectives and student placement. As with 
the strategies items, there was wide variation within each state; substantial 
proportions of respondents selected almost all the response choices. 

With respect to the item on curriculum improvement, Pennsylvania
educators indicated only a "minor change" while educators in Maryland indicated
the change was "moderate." In follow-up interviews conducted during Phase
Three of the study, it was clear that "improved" was interpreted in very
specific ways. Some of the more frequent adjectives used by educators in
Maryland in place of "improved" included "structured, coordinated, more
focused, more defined, sequentially ordered, more systematic, consistent, and 
created a consciousness (about what was being taught)." All of these referred 
to a tightening up of curricular content. What was missing was any judgment 
about whether the system was better off. 

With respect to narrowing of the curriculum, there were marked 
differences in response between educators in the two states. In Pennsylvania, 
approximately two thirds of the respondents indicated there was no change with 
respect to curriculum narrowing. On the other hand, in Maryland only one of 
seven respondents indicated no change; two thirds of them reported a moderate 
to total change. 

The above findings offer a snapshot of local educators' reactions to the
initiation of statewide mandatory minimum competency tests. The item level
findings hint at important differences between the two states. They also
suggest a great deal of district-to-district variation within each state.
Each of these two issues is addressed in more detail in analyses presented in
the next two sections.

Findings Regarding a Comparison of Testing Programs in Two States 

Clearly, Maryland's program should have had a greater impact on its local
systems than Pennsylvania's program, primarily because Maryland's policy
insinuated itself into an important organizational event, graduation, and
because preceding statewide improvement and actual test development activities
engendered a cumulative anticipation of the day the tests would be put into
place. On the other hand, Pennsylvania's program arose from dialogue limited
mostly to state level legislators and officials. Limited local knowledge
about the program plus its lack of implications for school operation seemed to
ensure that the test would have little impact beyond its stated purpose as a
means to help schools identify students in need of additional instruction.

The results in Table 3 assess the differences between the two states 1 

respondents. A mean score, for each respondent was computed by combining the 

six "strategies" items into one scale and the eight "curriculum and 

instruction adjustment scores" into another. The curriculum improvement and 

narrowing items were treated as single items. An analysis of variance was 

2 

conducted on the two scales and the two single items. 



² Prior to combining these items to create a scale, statistical tests
were conducted to ensure the appropriateness of such a step. First,
correlation matrices were examined to check that there was at least a moderate
correlation for the combined items and that there were not any excessively
high correlations. Second, an analysis of reliability (internal consistency)
was conducted to test that the items cohered. The results of those
calculations produced a coefficient of .76 for strategies and .82 for
curriculum and instruction adjustments, suggesting high internal consistency.
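
The scale construction and internal-consistency check described in this footnote can be sketched in a few lines. This is only an illustration: it assumes the reported coefficients are Cronbach's alpha (the text says only "reliability (internal consistency)"), and the function names and sample ratings are hypothetical, not the study's data.

```python
from statistics import variance


def scale_score(items):
    """Mean of one respondent's item ratings (the combined scale score)."""
    return sum(items) / len(items)


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of k item ratings."""
    k = len(responses[0])
    # Sample variance of each item across respondents.
    item_variances = [variance([r[i] for r in responses]) for i in range(k)]
    # Sample variance of each respondent's summed score.
    total_variance = variance([sum(r) for r in responses])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical ratings on three 1-to-5 items (not data from the study).
sample = [
    [4, 5, 4],
    [2, 2, 3],
    [5, 4, 5],
    [3, 3, 2],
    [1, 2, 1],
]
alpha = cronbach_alpha(sample)
scores = [scale_score(r) for r in sample]
```

Alpha approaches 1 as the items rise and fall together across respondents, which is the sense in which the items "cohere."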

Table 3: Analysis of Variance Comparison of Curriculum
and Instruction Scores by State
(N=1019)

                                    Mean
Cluster                          PA      MD      F         Scale

Strategies                       3.10    4.44    393.4*    1.00 to 5.00
Curriculum and
  Instruction Adjustments        1.94    2.75    148.7*    0.00 to 4.50
Curriculum Improvement           1.25    1.54     12.2*    0.00 to 4.00
Curriculum Narrowing             0.42    1.S3    44S.4     0.00 to 4.00

*Indicates significance well beyond the .001 level.

The findings were striking and consistent. For all four variables,
statistically significant differences between the states were found.
Maryland school districts focused more directly on improving test scores,
altered the curriculum to a greater extent, reported more improvement in the
curriculum, and felt the curriculum had narrowed more than their Pennsylvania
colleagues. In the case of the strategies employed, the mean in Pennsylvania
was at the middle of the five point scale whereas in Maryland it was only a
half point below the high end. This indicated a high level of attention to
improving the scores in Maryland in absolute terms as well as in comparison to
Pennsylvania. With respect to curriculum and instruction adjustments, the
difference was that between a change of minor magnitude in Pennsylvania and a
change of slightly less than moderate magnitude in Maryland. Finally, in
Maryland there was a much stronger feeling that the state mandated testing
program had narrowed and yet improved the curriculum.
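
For a two-group comparison of this kind, the F statistic is the ratio of between-state variance to within-state variance. The sketch below is only an illustration, assuming a one-way analysis of variance with state as the single factor; the function name and the ratings are hypothetical and are not the survey data.

```python
def one_way_anova_f(groups):
    """F statistic for a one-way analysis of variance over lists of scores."""
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]
    # Variation of the group means around the grand mean.
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Variation of individual scores around their own group mean.
    ss_within = sum(
        (x - m) ** 2 for g, m in zip(groups, group_means) for x in g
    )
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


# Hypothetical strategy-scale scores for two states (not the study's data).
pennsylvania = [3.0, 2.5, 3.5, 3.2, 2.8]
maryland = [4.5, 4.2, 4.8, 4.0, 4.6]
f_value = one_way_anova_f([pennsylvania, maryland])
```

With two groups, this F equals the square of the pooled-variance t statistic, so a large value reflects a mean difference that is large relative to the spread within each state.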

Essentially, the two states had different intentions in mind when the
testing programs were initiated, and the study data indicate that both were
being met. Pennsylvania wanted to increase the visibility of students who may
have been in need of additional instruction and originally had not expressed
interest in drastically revamping school programs. Maryland very consciously
wanted to affect the curriculum, first through a planned improvement process
and then via the graduation tests. These data reflect the differences between
the modest and the more ambitious approaches.

Recent Developments in the Two States: Raising the Stakes 
The above comparisons present a snapshot of the differences in educators'
reactions to the testing programs. The picture was taken in the late Fall of
1986 and the early Winter of 1987. Events in both states subsequent to the
survey seemed to increase the level of the stakes associated with the tests
and had an effect on staff sufficient to alter the responses made on the
questionnaire. In both states, an increase in the number of adjustments made 
in curriculum and instruction and an intensification of the strategies used to 
improve scores were notable. A detailed account of these changes is available 
in Corbett and Wilson (1988). 

The key event in Pennsylvania was the publication of the results from the 
spring of 1987 test administration. Rather than the customary low-key sending 
of the scores to districts for each to handle as it saw fit, the event was 
orchestrated by the Chief State School Officer (CSSO). In a public media
briefing, the CSSO provided documents that ranked districts in the state from
top to bottom in terms of the percentage of students who passed the cut-off
point. In addition, a subpopulation of schools that had achieved 100 percent
passing rates despite a "high risk" student population was singled out as
being "poised on the brink of excellence." And to cap off the presentation,
the CSSO touted the tests as the best measure available to assess the
effectiveness of Pennsylvania's schools. An immediate protest to this use of
the scores arose from educators across the state and resulted in the
withdrawal of the documents containing the rankings.

The withdrawal of the rankings did not strike the event from either
educators' or their communities' emotional record. Educators in three of the
six Pennsylvania districts visited in Phase Three argued that the "game" had
now changed in their systems:

The purpose of the test changed in September. It is no longer for
remediation but to rank order schools. [superintendent]

The results should be between the state and the school district if
the test is to help. When they release scores and say 53 kids need
help, we can say we've already identified of them. But the
negativism starts; it starts [phone] calls and there is no question
I now have pressure on me. [superintendent]

The test was not all that important.... But we might as well face up
to it; with the publication of school by school results one of
the goals will be to raise the percentage above the cut score.
[assistant superintendent]

What really seemed to be changing for the three districts in Pennsylvania
were the stakes; they got higher, primarily through the increased visibility
of score comparisons and the subsequent increased, albeit reluctant,
acceptance of the scores as a benchmark, that is, as a widely recognized point
of reference when discussing the performance of schools in the district and in
surrounding districts. Staff in the three districts reported that they did
not believe the tests to be particularly important educationally and did not
embrace the tests as valid indicators of attainment. They nevertheless
acknowledged that they already were, or would soon be, treating the scores
more seriously than in previous years.

This is best illustrated by a district whose surrounding districts
performed similarly on the MCT, even though the district felt that its
carefully and systematically developed curriculum far surpassed the offerings
of its neighbors. The response:

We don't believe in the tests that strongly but we will be forced to
see all material is covered before the tests. We definitely are
going to do it. We won't be caught in the newspapers again.
[superintendent]

The brunt of not "getting caught" again was to be borne by the reading
program, a recently revised, developmental curriculum. The timing of the test
administration required shifting the sequence of topics to be covered. An
outraged reading coordinator responded,

You have to alter a curriculum that is already working well and so
we can't follow the developmental process. Kids are already growing
in a structured program; but it [pressure to change] comes from the
board, community, and adverse publicity.

The superintendent empathized with the coordinator, 

I don't have much faith in the tests. I don't want to change the
curriculum, and it's not a major revision, but we've got to do
better. Still, it's not the right thing to do to anyone. I don't
want to over-react but I'm also going to have to spend time on
things I shouldn't have to do as well: public relations, testing
meetings, just to make the board feel comfortable. It'll never
happen again when we see a worse district doing better than us.

The interviews suggest that these districts were planning expedient
strategies to improve the test scores; just as clearly, there was resentment
at having to do so and a concern that what they were doing was compromising
some standard of good professional practice. The message they were giving was
that their test scores were becoming benchmarks for political reasons, namely
to appease school boards and community members who had had the opportunity to
see their school systems compared to neighboring districts and did not like
what they saw. And no matter how district staff had portrayed their
performance in the past, part of that portrayal in the future had to include
the test scores. Staff, in other words, were beginning to use the tests as a
reference for judging local effectiveness. This development reflected
obligation more than acceptance.

Maryland districts seemed to be sharpening the focus of the strategies to
improve scores, resulting in augmented pressure on teachers to get students to
pass. No single event dramatically heightened the impact of the tests.
Instead, the stimulus was the approach of the time when students had to pass
all four of the tests in order to receive a diploma.

In Maryland, the four tests were not regarded equally. Phase Three
interviews revealed that educators discriminated between the reading and math
tests on one hand and the writing and citizenship ones on the other. The
reading and math tests, in Maryland educators' minds, were adequate measures
of basic competence in the respective content areas and covered objectives
already well-entrenched in the curriculum. The curriculum development aspect
of the state initiative began in the late seventies, and these two tests were
the first to be developed, trial-tested, and implemented. Actual local
curriculum and instruction changes had been in place for seven to nine years
in some settings. By 1987, these alterations had become institutionalized to
the point that interview subjects in four of the five Phase Three districts
argued that the mean score for curriculum and instruction adjustments may have
been too low because staff had forgotten that what was now routine was once
novel. The result was that the two tests were no longer intrusive.

Such was not the case for the writing and citizenship tests. Both
generated considerable controversy. The writing test did so primarily because
staff viewed it as demanding a performance level well beyond that necessary to
be minimally competent in writing. The citizenship test's controversial
aspect centered around its requirement that students memorize information
about local, state, and federal governments, information that even the
teachers said they did not possess without special study. Fueling educators'
concerns were the facts that students had much more difficulty succeeding on
these two tests and that the time when the first cohort of students would have
to pass all four tests to receive a diploma was inexorably approaching. For
administrators, teachers with responsibilities in certain grades and in
certain content areas, and special education teachers, the pressure to achieve
passing scores was building and the impact on their work lives was great.

We've changed the whole social studies curriculum. We had to expand
the 7th and 8th grade American Studies to include more history (to
make up for content not being taught later) and now teach government
in the last term of 7th and 8th grades which we did not teach at all
as a separate entity in the past. And we have structured in key
points in the language arts scope and sequence. [central office
administrator]

It depends on who the teacher is and what the teacher teaches. You
can't have a bigger impact than on sequence or inserting a new
course. We now offer courses not included before and content that
changed from 10th to the 9th grades. With government, the impact is
overwhelming. [central office administrator]

As illustrated in the above quotes, there was a "differentiated" impact of
implementing the tests. Some parts of the system were affected little while
others felt considerable ramifications. Such a situation caused statistical
measures of central tendency such as the mean scores presented above to
disguise this important impact of the tests.

The "discomfort" of subgroups of staff involved with the two controversial
tests focused their attention more and more on the percentage of students
passing the tests and on adopting expedient methods of improving scores. This
"concentrated" approach was apparent in all five systems where Phase Three
interviews were conducted.

We are concentrating more on basics. We are now spending from
September to November on basic skills rather than on our
developmental program. [reading teacher]

I'm not opposed to the idea of testing. But I'm not so sure we
haven't gone overboard; the tail is wagging the dog. The original
idea was that there were to be certain standards the student would
have to meet, but if the student doesn't pass, people will ask
what's wrong within the school and teachers. [teacher]

When the scores are low, it takes me into the school for the names 
of the kids who failed. There is no stroking in schools where 
scores have dropped. Everyone is sitting around with bated breath 
waiting for the test scores. [central office administrator] 

We realize a kid is taken out of science every other day for 
citizenship and will fail science to maybe pass the citizenship 
test. [building administrator] 

These very targeted means for getting students to pass were acknowledged as a
necessary evil:

We've had to do things we didn't want to do. [central office 
administrator] 

We have materials provided by the county as 'quick help.' We were
told 'here's how to get kids to pass the test fast.' They were good
ideas but specifically on the test. For example, if the area in a
rectangle is shaded, you multiply; if not, you add. [teacher]

And in response to the above stream of comments, a teacher summarized, 

Talk about games and game-playing! 

The above comments suggest that the means for both the strategies and
curriculum and instruction adjustment scales in both states would increase if
the questionnaire were readministered. It is important to note that the
stakes were raised in the two states for two different reasons: (1) public
pressure to improve test scores that resulted from readily available
comparisons of performance in Pennsylvania, and (2) the proximity of both the
yearly test administration day and the day when the two troublesome tests
would actually serve as an obstacle to graduation in Maryland. Interestingly,
the stakes increased in what were originally both low and high stakes
situations. As they did so, educators' concern shifted almost completely to
influencing test performance. Put differently, the manifestations of the
seriousness with which the test was taken shifted. The shift can best be
described as a shift from a long-term focus to a short-term one, from using
the tests as one indicator among many to treating the next set of test results
as the most important outcome of schooling.

Results with Reference to District Comparisons 
The interplay of local setting, state context, and policy is more likely
to yield variations in implementation than consistency. Such was the case in
the two states examined in this study. This section explores the issue of
the differential impact of the testing program within a state. In other
words, what were the differences among local districts within a state that
influenced the particular instructional adjustments a district made in
response to the testing programs?

To explain variation in the intensity of district strategies to improve 
test scores and adjustments made to course content and teaching practices, 
responses from a single central office informant for each district were used. 
That informant was typically either the superintendent or the staff person 
most familiar with the state's testing program. It was felt that central
office administrators were in a better position to be informants at the system
level than teachers or building principals. Because multivariate statistical
techniques offered the best method for partialing out the independent effects
of several variables and because there were only 23 districts available in
Maryland, the analyses in this section were done only with the Pennsylvania
subsample (N=277).

The descriptive statistics presented in Table 4 summarize the variation
in instructional adjustments across Pennsylvania districts. These scores
represent the strategies and curriculum/instruction scale scores; they
highlight the diversity of responses reported by local educators in
Pennsylvania.

Table 4

Descriptive Statistics for Adjustment
Variables in Pennsylvania (N=277)

Instructional                     Standard     Observed        Theoretical
Adjustment               Mean     Deviation    Range           Range

Strategies               3.09     0.78         1.00 to 5.00    1.00 to 5.00
Curriculum/Instruction   1.94     0.76         0.00 to 3.63    0.00 to 4.50

In response to a question concerning the accuracy of the means, local
educators who participated in the feedback sessions generally agreed with
their accuracy for last year. However, the developments regarding the public
ranking of schools and the CSSO's increased emphasis on the test scores made
them think that both means would be higher if a later survey were conducted.
Evidence supporting this contention was presented in the "Recent Developments"
section above.

Using the conceptual framework presented in Figure 1, three categories of
variables were selected that might explain the level of these adjustments:

• internal environment (e.g. percent white, SES, size)

• state environment (i.e. political climate)

• other district adjustments³ (e.g. MCT used as benchmark, information
flow)

³ For the purposes of these analyses the two organizational adjustment
variables (MCT as a benchmark and the information flow) and the one
instructional adjustment variable not considered as the dependent variable
have been included as the last set of independent variables in the regression.
The adjustment variables are all scales. More detailed documentation
concerning construction of these scales is available in Corbett & Wilson
(1987).

Based on discussions with district staff during the Phase Three
interviews, it was decided to add a fourth category:

• MCT program characteristics (see Tables 5 and 6 for individual 
items) 

The four categories included a mix of individual survey items and
scales. As a first step in the analysis, simple bivariate correlation
coefficients were examined to explore the relationship of these variables with
strategies for improving test scores and the adjustments to curriculum and
instruction.⁴ As a second step, regression equations were calculated using
the four categories of variables. The first group of variables entered into
the regression equations was the internal environment measures. Subsequent
equations added one group of variables at a time until all four categories of
variables were entered.

Strategies. Table 5 summarizes the results of the regression estimates
of the four categories of variables on the intensity of instructional
strategies to improve test scores. The first column of numbers indicates the
standardized Beta coefficients for effects of internal environment variables.
The estimates indicate that there is a negative association between SES and
the intensity of the strategies. That is, the lower the district's SES
(measured by the percent of students in a district eligible for free lunch),
the more likely the district was to engage in strategies to improve test
scores. Also, the higher the percentage of students passing the reading
portion of the MCT in the previous year, the less likely a district was to
adopt strategies to improve test scores. (The opposite relationship was



⁴ Only variables with a significant bivariate relationship (p ≤ .05) were
included in the second phase. A few additional variables were excluded
because of the high number of missing cases.

Table 5

Standardized Regression Coefficients for
Strategies with Incremental Addition of
Independent Variables (N=186)

Independent
Variable                              (1)       (2)       (3)       (4)

(1) Internal Environment
  • SES                              -.183     -.118     -.084     -.043
  • PERCENT PASSING MCT READING      -.267**   -.254**   -.271**   -.266***
  • PERCENT PASSING MCT MATH          .176      .215*     .222      .223**

(2) State Environment
  • POLITICAL CLIMATE                           .323***   .221**    .124

(3) MCT Program Characteristics
  • MCT ACCURATELY PORTRAYS
    PERFORMANCE                                          -.019     -.044
  • VARIETY OF REMEDIATION
    ALTERNATIVES                                          .069      .026
  • MCT FAILURES RECEIVE
    REMEDIATION                                           .100      .053
  • MCT EXIT CRITERIA FOR
    REMEDIATION                                          -.024      .035
  • DISTRICT PERSON TO COORDINATE
    MCT                                                   .336      .296

(4) Other District Adjustments
  • C & I                                                           .188
  • INFORMATION FLOW                                                .152
  • MCT AS COMPARATIVE BENCHMARK                                    .129

R²                                    .10       .20       .35       .46
R² increment (from
previous model)                                (.10)     (.15)     (.11)

  * p < .05
 ** p < .01
*** p < .001

NOTE: The sample for the regression is smaller than the full sample because
of missing data for some of the variables.

observed with respect to the percentage of students passing the math
test, an anomaly we cannot explain.) The amount of variation accounted for by
these variables was only 10 percent.

The addition of the political climate variable doubled the R². Where
there was a positive political climate between the district and the SEA, more
strategies were used to improve test scores. The addition of this state
environment variable reduced the SES contribution to a nonsignificant level,
although the relationships between the strategies adopted and the percentage
of students passing the reading and math tests remained significant.

A number of MCT program characteristics revealed significant bivariate
association with the intensity of the strategies. However, when controlling
for the effects of the other variables, only one (whether the district
appointed a person to coordinate MCT activities) had a significant impact on
strategies. Districts with specially appointed personnel to offer MCT-related
staff development activities were more likely to adopt specific strategies
for improving test scores. This was the strongest finding in the regression
analysis. It seems logical since the primary role of such a person was to
work directly with district staff to carry out the activities indicated in the
strategies cluster (e.g. use of practice tests or developing special
resources). This category of variables, MCT program characteristics, added
considerably to the explained variation with an increase from 20 percent to 35
percent.

The fourth category, other district adjustments, added an additional 11
percent to the explained variation, bringing the total to just under 50
percent. These findings indicate that where discussion of information about
the MCT was more frequent, where MCT test results were more frequently used as
a benchmark for assessing district performance, and where more adjustments
were made to the curriculum, the more intense the strategies to improve test
results were.
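
The incremental procedure described above (entering one category of variables at a time and noting the gain in explained variation) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' analysis: it uses ordinary least squares on synthetic data, reports only R² and its increment rather than standardized Beta coefficients, and the function names, variable names, and effect sizes are all hypothetical.

```python
import random


def solve(A, c):
    """Solve the linear system A x = c by Gaussian elimination with pivoting."""
    n = len(c)
    M = [row[:] + [ci] for row, ci in zip(A, c)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= factor * M[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def r_squared(predictors, y):
    """R-squared of an ordinary least squares fit with an intercept."""
    n = len(y)
    cols = [[1.0] * n] + predictors
    p = len(cols)
    # Normal equations: (X'X) beta = X'y.
    A = [[sum(a * b for a, b in zip(cols[i], cols[j])) for j in range(p)]
         for i in range(p)]
    c = [sum(a * b for a, b in zip(cols[i], y)) for i in range(p)]
    beta = solve(A, c)
    fitted = [sum(beta[j] * cols[j][i] for j in range(p)) for i in range(n)]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


def blockwise_r2(blocks, y):
    """Enter predictor blocks cumulatively; return (name, R2, increment) per step."""
    entered, previous, steps = [], 0.0, []
    for name, cols in blocks:
        entered = entered + cols
        r2 = r_squared(entered, y)
        steps.append((name, r2, r2 - previous))
        previous = r2
    return steps


# Synthetic districts (hypothetical values, not the Pennsylvania survey data).
rng = random.Random(0)
n = 200
ses = [rng.gauss(0, 1) for _ in range(n)]
climate = [rng.gauss(0, 1) for _ in range(n)]
coordinator = [float(rng.random() < 0.5) for _ in range(n)]
strategies = [
    -0.3 * s + 0.4 * c + 0.6 * k + rng.gauss(0, 1)
    for s, c, k in zip(ses, climate, coordinator)
]

steps = blockwise_r2(
    [
        ("internal environment", [ses]),
        ("state environment", [climate]),
        ("MCT program characteristics", [coordinator]),
    ],
    strategies,
)
for name, r2, increment in steps:
    print(f"{name}: R2 = {r2:.2f}, increment = {increment:.2f}")
```

Because each model contains all predictors of the previous one, R² cannot decrease from step to step, and the increments sum to the final R², which is how the percentages reported for each category add up to the total.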

The Phase Three interviews confirmed and elaborated many of these 
findings. First, informants were quick to point out the strategies measure 
underestimated the current situation. Since the intervention by the CSSO and 
the public advertising of scores, strategies had intensified even more. As 
one superintendent noted without equivocation: "We will raise test scores." 
District personnel suggested that inservice topics had begun to appear that 
offered teachers simple, practical tips on how to improve students' chances of 
success. District staff reported less hesitancy to use drill and practice in 
weak areas and one teacher reported quite forthrightly: "Teachers have been
told to teach to the test." While many abhorred such practices, they were even
more concerned about the public consequences if they did not.

Another strategy discussed by several districts was the use of threats. 
The argument was that to affect students it was necessary to threaten them 
with something that was important. Suggesting that students who did not pass 
the test would be taken out of study hall and placed in remediation was enough 
to motivate many of them. One administrator estimated that 30 percent of 
those who failed the test the first time passed the second time solely by 
raising the anxiety level. 

Curriculum and Instruction (C&I) Adjustments. Table 6 presents the
results of the regression estimates for C&I adjustments. When only internal
environment variables are included in the regression equation, both district
size and community SES were related to C&I adjustments. That is, smaller
districts and districts with poor families were more likely to have staff who
reported greater C&I adjustments. When additional variables were added to the
equation, size was the only variable that continued to be statistically
related to C&I adjustments.

The proportion of explained variance increased dramatically when the
state environment variable was added (R² increased from .14 to .29). The
healthier the climate between the district and the SEA according to 
staff perceptions, the greater the magnitude of local C&I adjustments. This 
strong relationship held up even after the inclusion of all the other 
variables in the model. 

One MCT program characteristic (whether or not a district person had been
put in charge of MCT-related staff development activities) was related to C&I
adjustments. That relationship was maintained after the addition of other
variables.

In the last step of the regression analysis, the results showed that the
three other district adjustment categories were related to C&I adjustments.
First, where there was a greater acceptance of the test results as an
important benchmark of success, local C&I adjustments were of a greater
magnitude. Second, where there was a more frequent flow of communication in
the district about the state testing program, the magnitude of C&I adjustment
was higher. Finally, where strategies were more focused to improve test
scores, more C&I adjustments were made. All of the variables in the
regression account for half of the overall variation in C&I adjustments
(R² = .51).

Phase Three interview subjects offered important insights about the 
influence of the political climate and benchmark factors. The six
Pennsylvania districts varied widely in how positively staff members viewed 



Table 6

Standardized Regression Coefficients for Curriculum
and Instruction Adjustments with Incremental Addition
of Independent Variables in Pennsylvania (N=185)

Independent
Variable                              (1)       (2)       (3)       (4)

(1) Internal Environment
  • SES                              -.191*    -.111     -.063     -.030
  • SIZE                             -.177     -.159     -.182**   -.123
  • HIGH ACHIEVING STUDENTS           .097      .014     -.029     -.060
  • PERCENT PASSING
    MCT READING, GRD5                -.119     -.067     -.075     -.018
  • PERCENT PASSING
    MCT MATH, GRD5                   -.1-3     -.090     -.030     -.136

(2) State Environment
  • POLITICAL CLIMATE                           .407***   .231      .217

(3) MCT Program Characteristics
  • VARIETY OF REMEDIATION
    ALTERNATIVES                                          .123      .074
  • MCT DUPLICATES OTHER
    TESTS                                                 .093      .054
  • MCT ACCURATELY PORTRAYS
    PERFORMANCE                                           .075      .076
  • DISTRICT PERSON TO
    COORDINATE MCT

(4) Other District Adjustments
  • MCT AS COMPARATIVE
    BENCHMARK                                                       .183
  • TESTING STRATEGIES                                              .153**
  • INFORMATION FLOW                                                .223

R²                                    .14       .29       .38       .51
R² increment (from
previous model)                                (.15)     (.09)     (.13)

  * p < .05
 ** p < .01
*** p < .001

NOTE: The sample for the regression is smaller than the full sample because
of missing data for some of the variables.



the SEA and the districts' relationship with it (i.e., the political climate).
In one district that had made few C&I adjustments of substance, a central
office administrator portrayed the situation as follows:

The community used to hold us accountable. Now we
have people in Harrisburg [the state capital]. Who are they to
think they know what our needs are? The state has
become someone we have to beat rather than a partner
to work with.

In another district where there was a very high proportion of students doing 
very well on the MCT, an administrator argued that it was a "pointless 
exercise" to make C&I changes based on MCT objectives for fear that "a well 
balanced curriculum could be overbalanced to a minimalist one." The climate 
had become hostile enough that administrators in the district had joined a 
battle to exempt the district from the MCT. 

On the more positive side, while there was no outright admiration 
expressed for the MCT program, at least one of the six districts adopted the 
attitude that the MCT could directly help the district. In this system staff 
at one school had gone so far as to write lyrics to accompany the song "High
Hopes" in an effort to motivate students (and staff) to perform well on the
tests and to encourage staff to support necessary C&I improvements. Every day 
for a month before the test, students and staff heard the song over the 
loudspeaker and joined enthusiastically in singing it. A sample verse 
claimed: 

We have worked and studied so long, 
Hope we don't get anything wrong, 
And as you've probably guessed 
On the test 

We'll do our very best 
Cause we have high hopes... 

The use of test scores as an important benchmark for comparing school and
district performance was also viewed from varying perspectives in the six
districts. On one extreme was an administrator who buried the test results in
a bottom desk drawer when they arrived, arguing that the scores created too
narrow a definition of what should be taught and how students with learning
deficiencies should be remediated. In the middle, teachers and administrators
alike shared a concern that the MCT results were being used as "an absolute
measure of effectiveness in schools." District administrators were quick to
point out the potential negative consequences of public disclosure of low test
scores. However, there was also acknowledgement of the political reality of
needing to address the issue. The comment "We will raise test scores," while
not stated quite that boldly by everyone, was a refrain in four of the six
Phase Three districts. On the other extreme was a district where two junior
high schools with comparable student populations reported slightly different
test score results (an 89 percent pass rate versus a 96 percent rate).
Although staff members from the lower scoring school explained that they
probably took the test less seriously, the community took the difference in
scores much more seriously. Enough pressure was created to cause a central
office administrator to respond: "They'd [the school] better take it more
seriously next time."

In response to the finding that an increased information flow was 
associated with greater C&I adjustments, interviewees reported that the most
useful information was the sharing of test objectives and the process of 
evaluating the match between those objectives and those already contained in 
the district curriculum. Where such information was being shared and there 
was not a great deal of overlap between curricular and MCT objectives, there 
was higher probability of substantive adjustments being made in C&I. 



Conclusions

Several important summary points can be made. First, the study
demonstrates the strength of the high stakes/low stakes distinction between
the two states. A state program had the greatest impact when the scores, or
passing rates, were a critical ingredient in making important decisions, in
line with Madaus' (1987) original argument. In Maryland, the important
decision was graduation. However, in Pennsylvania, public comparisons of the
scores of schools also increased the stakes by calling community attention to
variations in school performance within and across districts. This single
event in Pennsylvania moved a low stakes program to one with at least moderate
stakes.

An important question is: Was this for the better? The qualitative data 
from Phase Three of the study suggested that as the stakes intensified in both 
states, there was a point at which district strategies took on the flavor of a
single-minded devotion to specific, almost "game-like" ways to increase the 
test scores. Pennsylvania districts, in particular, that began to take the 
tests more seriously reported that they did so for political reasons and not 
because they believed that they were actually improving their instructional 
program. Prior to this point, the strategies emphasized more systematic
changes in the curriculum. Beyond this point, staff began to respond to 
questions about effects with the phrase: "Some good things have happened as a 
result of the tests, but..." Staff members' reservations about the practices
they were engaging in to improve the scores followed the "but." This analysis 
suggests that a high stakes strategy seems to have desirable consequences as 
long as districts are not put under too much pressure. When the pressure to
succeed becomes too intense, a turning point is reached and the positive 
effects become overwhelmed by negative consequences. The exact turning point 
would vary from district to district; but it was clear that the test scores 
were beginning to govern activity more directly, as Mintzberg (1983) predicted
could be the case when an organizational outcome increases in importance.
That it was the difference in stakes that explained the differences in
mean scores between the two states rather than simply the length of time that 
the state programs had been in place is supported in two ways. One, all 
indications were that the Pennsylvania means would have risen with the 
commensurate increase in stakes; and two, Maryland informants suggested that 
time likely had reduced the reported means because educators had forgotten 
that current routines were once innovations. 

Second, the perceived political climate between the district and the 
state department played a relatively strong role in both states (see Corbett &
Wilson (1987) for a discussion of the influence in Maryland) in explaining
district variations in the impact of the tests on instruction. Essentially,
the better the communication between an LEA and SEA and the more the LEA
believed SEA actions were not politically motivated, the more likely it was
that the district would: match local objectives to those on the test, alter 
course content, provide increased and appropriate attention to students with 
learning needs, and report that teachers felt greater pressure to improve test
scores. One interpretation of this finding is that this is a "goodwill"
factor which is also closely related to positive district perceptions about
the tests' validity and the appropriateness of the testing procedures. That
is, some districts for whatever reason were favorably disposed toward the
testing program, and this general "good" feeling about the program engendered
a willingness to make considerable adjustments in local operations. Thus,
the historical relationship between an LEA and SEA may outweigh the particular
sanctions built into specific policies, even under high stakes conditions. 

Third, demographic characteristics played surprisingly weak roles in 
explaining district variations. Socio-economic status (percentage of students
on free lunch) of the clientele the district served and the type of community
served (urban, suburban, or rural) contributed little to the explanatory power
of the regression models in Pennsylvania. Demographic characteristics were
not totally unimportant, however. Noteworthy was the negative and significant
relationship between district size and curriculum and instruction adjustments.
Smaller districts made more C&I changes on objectives and content than larger
ones. One explanation offered in feedback interviews suggested that small
districts may have relied on a "textbook" curriculum in the past where the
instructional program was determined solely by the texts adopted. Subsequent
to the state MCT program, such districts had to engage in local curriculum
development to better match instruction with test content. 

Fourth, the findings demonstrated the need to insert a "System Testing 
Program" category into the study's conceptual framework (Figure 1). There was 
considerable district-to-district variation in how accurately local staff 
believed the state MCT portrayed attainment, the extent of remediation 
alternatives, the use of exit criteria for remediation, and whether a staff 
member had been put in charge of MCT-related staff development activities. 
This finding highlights the adaptability of individual districts in terms of 
putting programs into place. Systems interpreted the state program 
differently, a fact of life beyond SEA control. These interpretations 
affected local perceptions of the need, validity, and "burden" of the state 
program, which in turn influenced the magnitude of adjustments made. 

Finally, the findings show the high significance of the original system 
adjustments categories from Figure 1 in explaining district variation in 
instructional adjustments. Several internal and external environment 
variables that were significant factors in early steps in the regression 
analysis for Pennsylvania became insignificant when the adjustment categories 
were added. This supports the idea that district response was not
predetermined by its demographic characteristics. Rather, how the testing
program was interpreted and implemented locally had the greatest influence on
how substantially the curriculum was affected.

In general, some positive results attended the state testing programs. 
Educators in both states felt their curriculum offerings had become more 
defined; they welcomed the additional information on students; and they 
believed students' skills in some areas were improving. But they had
misgivings as well. These concerns all centered around the use of test scores
as benchmarks for comparisons among schools and as key measures of system 
effectiveness. Concerns over the validity of the tests and curriculum 
narrowing might have been downplayed except for the fact that student 
performance on the tests was becoming increasingly important in both states.
"Getting the scores up" seemed to turn minor concerns into significant 
confrontations between sound educational practices and more questionable 
test-specific ones. 